This repository was archived by the owner on Jul 16, 2025. It is now read-only.

feat: implement document loader & transformer for store indexing #343

Merged: 1 commit merged on Jun 30, 2025

chr-hertel
Member

No description provided.

@chr-hertel chr-hertel requested a review from Copilot June 26, 2025 13:59
Copilot

This comment was marked as outdated.

@chr-hertel chr-hertel force-pushed the feat-loader-transformer-pipeline branch from 7e27551 to d97f44e Compare June 26, 2025 14:08
@chr-hertel chr-hertel changed the title feat: implement document loader & transformer pipeline for store indexing feat: implement document loader & transformer for store indexing Jun 26, 2025
@chr-hertel chr-hertel marked this pull request as draft June 26, 2025 14:54
@chr-hertel chr-hertel force-pushed the feat-loader-transformer-pipeline branch from d97f44e to c76251f Compare June 26, 2025 14:59
@OskarStark OskarStark requested a review from Copilot June 26, 2025 15:07
@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements a document loader and transformer for store indexing. It also refactors the Indexer to use batch processing with the new Vectorizer and adds tests for document splitting and loading.

  • Refactored Indexer to remove clock-based sleep logic and integrate batch vectorization via the Vectorizer.
  • Added comprehensive tests for TextSplitTransformer, ChainTransformer, and TextFileLoader.
  • Introduced new loader and transformer interfaces and implementations.
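
To make the transformer idea in the bullets above concrete, here is a minimal sketch of splitting text into overlapping chunks and chaining transformation steps. It is not the library's API: `splitTexts()`, `chainTransformers()`, and the chunk size and overlap parameters are hypothetical names chosen for illustration, and the split is byte-based for simplicity.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: these functions are NOT the library's API.

/**
 * Splits each text into chunks of at most $chunkSize bytes, where
 * consecutive chunks of the same text share $overlap bytes.
 *
 * @param iterable<string> $texts
 *
 * @return \Generator<int, string>
 */
function splitTexts(iterable $texts, int $chunkSize, int $overlap): \Generator
{
    if ($chunkSize < 1 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new \InvalidArgumentException('Require 0 <= overlap < chunkSize.');
    }

    $step = $chunkSize - $overlap;
    foreach ($texts as $text) {
        $length = strlen($text);
        for ($start = 0; $start < $length; $start += $step) {
            yield substr($text, $start, $chunkSize);
            if ($start + $chunkSize >= $length) {
                break; // this chunk already reached the end of the text
            }
        }
    }
}

/**
 * Applies the transformers left to right; each one receives the
 * output of the previous one.
 *
 * @param iterable<string>                                   $texts
 * @param list<callable(iterable<string>): iterable<string>> $transformers
 *
 * @return iterable<string>
 */
function chainTransformers(iterable $texts, array $transformers): iterable
{
    foreach ($transformers as $transformer) {
        $texts = $transformer($texts);
    }

    return $texts;
}

// Usage: split into 4-byte chunks with 1 byte of overlap, then upper-case.
$chunks = chainTransformers(['abcdefghij'], [
    static fn (iterable $t): iterable => splitTexts($t, 4, 1),
    static function (iterable $t): \Generator {
        foreach ($t as $chunk) {
            yield strtoupper($chunk);
        }
    },
]);

$result = iterator_to_array($chunks, false); // ['ABCD', 'DEFG', 'GHIJ']
```

Because every step consumes and returns an iterable, steps can be generators and nothing is materialized until the final consumer iterates over the chain.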

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/Store/IndexerTest.php | Renamed test methods and removed the dependency on MockClock. |
| tests/Store/Document/Transformer/TextSplitTransformerTest.php | Added tests to validate document splitting behavior with various parameters. |
| tests/Store/Document/Transformer/ChainTransformerTest.php | Added tests to ensure chaining of document transformers works as expected. |
| tests/Store/Document/Loader/TextFileLoaderTest.php | Added tests for handling valid/invalid file sources and metadata inclusion. |
| src/Store/Indexer.php | Refactored to incorporate batch vectorization using Vectorizer but introduced a potential issue with processing remaining chunks. |
| src/Store/Document/Vectorizer.php | Introduced a separate component to handle document vectorization. |
| src/Store/Document/Transformer/* | New transformer components (TextSplit, ChunkDelay, ChainTransformer) added. |
| src/Store/Document/Loader/* | New TextFileLoader and LoaderInterface implementations added. |
| examples/store/document-splitting.php | Added example demonstrating document splitting usage. |
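
The review does not spell out the "remaining chunks" issue noted for src/Store/Indexer.php. One common pitfall in batch loops of this kind is forgetting to flush the final, partially filled batch. The sketch below illustrates that general pattern only. `indexInBatches()` and the `$storeBatch` callback are hypothetical names, not the Indexer's actual code.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: indexInBatches() and $storeBatch are illustrative
// names and do not reflect the library's Indexer implementation.

/**
 * Groups chunks into batches of $batchSize and hands each batch to $storeBatch.
 *
 * @param iterable<string>             $chunks
 * @param callable(list<string>): void $storeBatch
 *
 * @return int number of batches passed to $storeBatch
 */
function indexInBatches(iterable $chunks, int $batchSize, callable $storeBatch): int
{
    if ($batchSize < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];
    $batches = 0;
    foreach ($chunks as $chunk) {
        $batch[] = $chunk;
        if (count($batch) === $batchSize) {
            $storeBatch($batch);
            ++$batches;
            $batch = [];
        }
    }

    // Flush the final partial batch; without this step the trailing
    // chunks would be dropped whenever the total is not a multiple
    // of the batch size.
    if ([] !== $batch) {
        $storeBatch($batch);
        ++$batches;
    }

    return $batches;
}

// Usage: five chunks with a batch size of two produce three batches.
$stored = [];
$count = indexInBatches(
    ['a', 'b', 'c', 'd', 'e'],
    2,
    static function (array $batch) use (&$stored): void {
        $stored[] = $batch;
    }
);
// $stored is [['a', 'b'], ['c', 'd'], ['e']] and $count is 3.
```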

@chr-hertel chr-hertel force-pushed the feat-loader-transformer-pipeline branch 3 times, most recently from 8f5c001 to 6e484a2 Compare June 27, 2025 23:06
@chr-hertel chr-hertel marked this pull request as ready for review June 27, 2025 23:07
@chr-hertel chr-hertel requested a review from OskarStark June 27, 2025 23:08
@chr-hertel chr-hertel force-pushed the feat-loader-transformer-pipeline branch from 6e484a2 to 7e2d5ba Compare June 30, 2025 21:14
@chr-hertel chr-hertel merged commit 0f014e2 into main Jun 30, 2025
7 checks passed
@chr-hertel chr-hertel deleted the feat-loader-transformer-pipeline branch June 30, 2025 21:16
chr-hertel added a commit to symfony/ai that referenced this pull request Jul 4, 2025
…ndexing (chr-hertel)

This PR was merged into the main branch.

Discussion
----------

feat: implement document loader & transformer for store indexing

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         |
| Issues        |
| License       | MIT

Cherry picking php-llm/llm-chain#343

Commits
-------

83ce86f feat: implement document loader & transformer for store indexing (#343)
@chr-hertel chr-hertel added the BC BREAK Backwards compatibility break label Jul 6, 2025
Labels: BC BREAK (Backwards compatibility break), feature
2 participants